From Machine Readable Dictionaries to Lexical Databases: the CONCEDE Experience

Authors

  • TOMAŽ ERJAVEC
  • ROGER EVANS
  • NANCY IDE
  • ADAM KILGARRIFF
Abstract

It is commonly held that machine-readable dictionaries play a key role in bootstrapping effective wide-coverage language technology, especially for less well-resourced languages. However, while the linguistic knowledge they contain is clearly necessary for this goal, it is far from clear that the format in which it is presented is sufficient to reach it. A crucial step in deploying such resources is to map them into lexical databases with a standardised and well-understood structure and semantics. Considerable additional benefits are obtained if that structure and semantics are shared with other linguistic resources. Achieving this, however, is often not easy. This paper describes how such a mapping was carried out in the CONCEDE project for six Central and Eastern European languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene) for which few wide-coverage lexical resources had previously been available. In a two-stage process, the machine-readable data for each language was first mapped into broadly compatible, TEI-compliant SGML representations, and these representations were then harmonised into a single XML scheme. The resulting framework offers a concise, flexible lexical database specification with a demonstrable ability to cope with a diverse range of dictionary and language requirements, together with lexical resources suitable for monolingual and multilingual applications.
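The two-stage process can be illustrated with a minimal Python sketch. It assumes a TEI-style element set (`entry`, `form`, `orth`, `gramGrp`, `pos`, `sense`, `def`, names taken from the TEI dictionary module). The input records, the `to_tei_entry` and `harmonise` functions, the `headword_tag` parameter, and the `SHARED_TAGS` table are all hypothetical. This is not the CONCEDE encoding. The sketch also uses XML for both stages, whereas the project's first stage produced SGML, and it shows only tag renaming, the simplest kind of harmonisation step.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat records standing in for entries from two different
# machine-readable dictionaries; field names and data are illustrative only.
slovene_record = {"headword": "hiša", "pos": "noun", "senses": ["house"]}
estonian_record = {"headword": "maja", "pos": "noun", "senses": ["house"]}


def to_tei_entry(record: dict, headword_tag: str = "orth") -> ET.Element:
    """Stage 1 (sketch): map one flat record onto a TEI-style <entry>.

    headword_tag (hypothetical) lets each source keep its own local
    convention, leaving the per-language encodings only broadly compatible.
    """
    entry = ET.Element("entry")
    form = ET.SubElement(entry, "form")
    ET.SubElement(form, headword_tag).text = record["headword"]
    gram = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram, "pos").text = record["pos"]
    # One numbered <sense> per gloss, each holding a <def>.
    for n, gloss in enumerate(record["senses"], start=1):
        sense = ET.SubElement(entry, "sense", n=str(n))
        ET.SubElement(sense, "def").text = gloss
    return entry


# Hypothetical table mapping local tag names to the shared scheme's names.
SHARED_TAGS = {"hw": "orth"}


def harmonise(entry: ET.Element, mapping: dict = SHARED_TAGS) -> ET.Element:
    """Stage 2 (sketch): rename local tags in place so all entries share one scheme."""
    for elem in entry.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return entry


entries = [
    harmonise(to_tei_entry(slovene_record)),
    harmonise(to_tei_entry(estonian_record, headword_tag="hw")),
]
for e in entries:
    print(ET.tostring(e, encoding="unicode"))
```

After harmonisation both entries use `<orth>` for the headword, even though the second was first encoded with a local `<hw>` tag. Keeping the mapping in a table separates each language's local conventions from the shared target scheme.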

Similar resources

An Assessment Of Semantic Information Automatically Extracted From Machine Readable Dictionaries

In this paper we provide a quantitative evaluation of information automatically extracted from machine readable dictionaries. Our results show that for any one dictionary, 55-70% of the extracted information is garbled in some way. However, we show that these results can be dramatically reduced to about 6% by combining the information extracted from five dictionaries. It therefore appears that ...

ITRI-03-03 From Machine Readable Dictionaries to Lexical Databases: the CONCEDE Experience

An Approach to Building the Hierarchical Element of a Lexical Knowledge Base from a Machine Readable Dictionary

This abstract describes an approach to extracting taxonomies from machine readable dictionaries and using them to structure a lexical knowledge base which incorporates default inheritance. Taxonomy construction is based on an intuitive notion of the organisation of the substantial quantities of data in machine readable dictionaries which were developed for quite independent purposes. Our intent...

Machine-Readable Dictionaries

The papers in this panel consider machine-readable dictionaries from several perspectives: research in computational linguistics and computational lexicology, the development of tools for improving accessibility, the design of lexical reference systems for educational purposes, and applications of machine-readable dictionaries in information science contexts. As background and by way of introdu...

Multilingual Aspects of Multiword Lexical Units

As most of the machine-readable dictionaries contain clearly insufficient information about multiword lexical units, there is a constant need to extend and tune specialized lexical databases to account for new expressions. In this paper, we present a system exclusively based on statistics that massively extracts from unrestricted text corpora contiguous and noncontiguous rigid multiword lexical...

Publication date: 2003